Several large multinational studies that used EDF as a vehicle to exchange polygraphic data have demonstrated that the specification is sufficiently simple and leaves enough flexibility to be easily applied in practice. Therefore, any implementation of EDF should simply follow the official specification exactly. However, some questions were asked regularly during these studies, so this FAQ list may be of some additional help.
Changing your EDF implementation according to any of these answers does not cause any incompatibility with EDF files or software that followed the official specs. Nor would you lose any of the original simplicity or flexibility. Some answers define EDF export more strictly than the official specs do. But EDF import (reader) software should accommodate all options that the official specs leave to the implementor. The list may give you an idea of these options.
EDF was designed in one day, and we originally had in mind the exchange of
polygraphic recordings between mainly PCs in the old millennium. I suggest that you also abide by the three simple red-color
additional guidelines (at Q3, Q7 and Q10), so your EDF can be used all over
the world, between any machines and until the year 2084. If you want to use EDF also for the exchange of annotations,
events and automatic or manual analysis results, then it is probably wise to
adopt the green-color additional guidelines as well.
Here is the list of Questions and Answers:
Q1. For text fields in the header, what is the character set to use?
Export. EDF specs say that header information should be coded in
ASCII strings. The American Standard Code for Information Interchange (ASCII) is
7 bits wide and consists of control characters (byte values 0..31 and 127, for
instance for LineFeed, FormFeed, Carriage Return, Delete) and printable
characters (32..126). So, unless you are looking for trouble, use only printable
ASCII characters (32..126).
Import. Should an EDF file ask for trouble
(that is, contain control characters), EDF readers should not try to execute
these. Should an EDF file contain control characters or otherwise illegal
characters (127..255), warn the producer of that file.
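The import advice can be illustrated with a small check that scans header bytes for anything outside 32..126 before displaying them. This is only a minimal sketch; the function names are hypothetical and not part of any EDF specification or tool.

```python
def find_illegal_header_bytes(header: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte outside printable ASCII 32..126."""
    return [(i, b) for i, b in enumerate(header) if not 32 <= b <= 126]


def sanitize_header_field(field: bytes) -> str:
    """Decode a header field for display, replacing each illegal byte with '?'.

    Control characters are never passed on to the terminal or GUI.
    """
    return "".join(chr(b) if 32 <= b <= 126 else "?" for b in field)
```

A reader could show the sanitized text to the user and list the offsets returned by `find_illegal_header_bytes` in a warning for the producer of the file.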
Q2. Is the correct syntax for the date and time fields DD.MM.YY and
hh.mm.ss (D, M, Y, h, m, and s = [0..9]) as in "02.08.51"? I also saw "2.8.51"
and " 2. 8.51".
Export. The official specs say "The information
in the ASCII strings must be left-justified and filled out with spaces" and "8
ascii : startdate of recording (DD.MM.YY)" and "8 ascii : starttime of recording
(hh.mm.ss)". The format does not specify that D, M, Y, h, m and s = [0..9].
Therefore, some may argue that a space or even a blank (null character, 0) is
also allowed in the ASCII string. However, using spaces conflicts with the
"left-justification" spec and the null character is a 'forbidden' ASCII control
character (see Q1). So, my advice is to produce EDF date and time fields
containing only characters 0..9 and the period (.) as a separator, for example
"02.08.51".
Import. Still, EDF viewers should also accommodate
" 2. 8.51" and "2.8.51". And it is probably wise (and not much work) to have
them also accommodate different separators, like in 02:08-51 and 02/08'51.
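A tolerant reader along these lines might simply pick the three digit groups out of the field and ignore whatever separates them. The sketch below assumes that any non-digit character counts as a separator; the function names are hypothetical.

```python
import re


def parse_edf_triplet(field: str) -> tuple[int, int, int]:
    """Parse an 8-character date or time field into three integers.

    Accepts "02.08.51", " 2. 8.51", "2.8.51", "02:08-51" and similar variants.
    """
    parts = re.findall(r"\d+", field)
    if len(parts) != 3:
        raise ValueError(f"cannot parse date/time field {field!r}")
    first, second, third = (int(p) for p in parts)
    return first, second, third


def format_edf_triplet(first: int, second: int, third: int) -> str:
    """Write the strict export form: two digits each, separated by periods."""
    return f"{first:02d}.{second:02d}.{third:02d}"
```

Writing with `format_edf_triplet` and reading with `parse_edf_triplet` keeps export strict while import stays lenient.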
Q3. How about the Y2K millennium problem?
In fact, it is a
centennial problem. An EDFdate of "02.08.51" in the "Startdate of Recording"
field could specify a recording from 2051, 1951, 1851, 1751, etc. First, it is
wise to put the full date in the "local recording identification" field (80 free
ASCII characters), for instance in the format "Startdate 02-AUG-1951". This also avoids
any confusion between American and European date format.
Next, you can use 1985 as a clipping date. EDF was used for the first time in
1989. At that time, some older recordings from 1985 were also converted to EDF.
No EDF was recorded before 1985. Therefore you can use 85 as a clipping year in
your EDF software: if the EDFyear (yy=51 in the above
example) is equal to or larger than 85, then the real year is assumed to be
EDFyear + 1900. If the EDFyear is smaller than 85, the real year is assumed to
be EDFyear + 2000. In other words, in the EDF startdate,
yy=00-84 means yyyy=2000-2084 and yy=85-99 means yyyy=1985-1999.
This clipping date was discussed and adopted by the Siesta project in 1999 and is also
implemented in my viewer PolyMan.
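The clipping rule and the full-date text can be sketched as follows. The function names and the month table are illustrative assumptions; only the value 85 and the "Startdate 02-AUG-1951" format come from the answer above.

```python
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def full_year(yy: int, clip: int = 85) -> int:
    """Expand a two-digit EDF year: 85..99 -> 1985..1999, 00..84 -> 2000..2084."""
    if not 0 <= yy <= 99:
        raise ValueError(f"EDF year must be 0..99, got {yy}")
    return 1900 + yy if yy >= clip else 2000 + yy


def startdate_text(day: int, month: int, year: int) -> str:
    """Build the unambiguous text for the local recording identification field."""
    return f"Startdate {day:02d}-{MONTHS[month - 1]}-{year:04d}"
```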
Q4. Are the "digital minimum" and "digital maximum" values hints or strict
limits?
The specs say "The digital minimum and maximum of each signal
should specify the extreme values that can occur in the data records." Note the
word "can". It is not necessary that these values actually DO occur. So take
safe values that you know the signal will not exceed, for instance the range of
the ADC. Note that "The physical (usually also physiological) minimum and
maximum of this signal should correspond to these digital extremes". This
correspondence is necessary for assessing gain and offset of the signal.
Q5. Why not always use -32767 for "digital minimum" and +32767 for
"digital maximum"?
Export. It is formally correct EDF as long as
the purpose (specification of offset and amplification of the signal) is met
with sufficient accuracy.
Q6. Which is the preferred method of encoding a channel, where gain =
(physical maximum - physical minimum) /(digital maximum - digital minimum) is
negative? Using physical minimum > physical maximum or using digital minimum
> digital maximum?
Export. The specs say "The digital minimum
and maximum of each signal should specify the extreme values that can occur in
the data records. These often are the extreme output values of the A/D
converter. The physical (usually also physiological) minimum and maximum of this
signal should correspond to these digital extremes...". So, just reading this
chronologically, first specify digital maximum > digital minimum, then derive
the 'corresponding' physical minimum and physical maximum which in this case
leads to physical minimum > physical maximum.
Import. Import
routines should allow both alternatives because it is not much programming (just
get gain and offset) and because someone else may have an interpretation
different from mine.
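Computing gain and offset takes only a few lines, and the same code handles both ways of expressing a negative gain. This is a minimal sketch with hypothetical function names; the gain formula is the one quoted in the question, and the offset is chosen so that digital minimum maps onto physical minimum.

```python
def gain_offset(phys_min: float, phys_max: float,
                dig_min: int, dig_max: int) -> tuple[float, float]:
    """Return (gain, offset) such that physical = gain * digital + offset."""
    if dig_max == dig_min:
        raise ValueError("digital minimum and maximum must differ")
    gain = (phys_max - phys_min) / (dig_max - dig_min)
    offset = phys_min - gain * dig_min
    return gain, offset


def to_physical(sample: int, gain: float, offset: float) -> float:
    """Convert one stored digital sample to its physical value."""
    return gain * sample + offset
```

Swapping the physical extremes or swapping the digital extremes yields the same negative gain, so an import routine built this way does not need to care which convention the exporter chose.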
Q7. Are "+22", ".5", "1E3" valid syntaxes for number fields?
Yes,
as long as the numbers are left-justified in the ASCII strings and filled out
with spaces. "22" and "-1.23E-4" are also OK. In the latter example, better
accuracy can be obtained by using a standardized
dimension prefix. So use "-123.456" and the dimension
"uV " rather than "-1.23E-4" and the dimension
"V ". In accordance with the examples in the
original publication and in order to avoid Continental / (American) English
confusions, never use a comma "," for a digit grouping
symbol, nor for a decimal separator. When a decimal separator is required, use a
dot (".") only.
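A reader could check number fields against these rules before converting them. The sketch below is an assumption about how one might do that: it uses a regular expression because Python's `float()` alone also accepts text such as "nan" or "1_000". The function names are hypothetical.

```python
import re

# Optional sign, digits with an optional dot (or a leading dot), optional exponent.
_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([Ee][+-]?\d+)?")


def is_valid_number_field(field: str) -> bool:
    """True if the field is a left-justified, space-filled number without commas."""
    if field != field.lstrip():
        return False  # leading spaces break left-justification
    return _NUMBER.fullmatch(field.rstrip(" ")) is not None


def parse_number_field(field: str) -> float:
    """Convert a valid number field to a float, or raise ValueError."""
    if not is_valid_number_field(field):
        raise ValueError(f"invalid EDF number field {field!r}")
    return float(field)
```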
Q8. How to specify signals that cannot be calibrated (like an oral-nasal
thermocouple for respiration flow, or an event button)?
Export.
Just set the physical dimension to some meaningless value like
" ". Put appropriate values in the
digital minimum/maximum fields and dummy values in physical minimum/maximum
fields. Do not make physical minimum = physical maximum because that may
result in 'division by zero' errors in programs that compute the signal gain
from these values.
Import. Some EDF files may not contain valid
numbers in the digital/physical minimum/maximum fields, especially when signals
were not calibrated. It should still be possible to read these signals, albeit
uncalibrated.
Q9. Do non-integer sampling frequencies (like 1/30 Hz) cause problems?
Not necessarily. Good viewers will count samples and compare these with
"number of samples in a datarecord" and in this way count how many datarecords
have been passed (and consequently how many "durations of a datarecord").
Because this is all integer computation, there are no round-off errors! This is
why EDF recommends the "duration of a datarecord" to be an integer number of
seconds. In the 1/30 Hz example, "duration of a datarecord" and "number of
samples in a datarecord" can be 30 and 1, respectively. Or 3600 and 120,
respectively.
However, if a sampling frequency is
999.98Hz (for instance due to small inaccuracy of the ADC clock), 'integer EDF'
would be possible using datarecords of 50s and 49999 samples of each signal
in this datarecord (999.98 * 50 = 49999). Even if only one signal is in the file, there would be more
than 61440 bytes in a datarecord (49999 * 2 = 99998). The official specs say that in that case the
duration should be a float value less than 1s. This will inevitably cause a
small round-off error in the timing.
In an even more
extreme example, like 999.999998Hz, it is better to assume it to be 1000Hz (so
for instance 1000 samples in a 1s datarecord) because this causes a smaller
error than a non-integer duration of the datarecord would.
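The integer bookkeeping described in this answer can be sketched as follows. It assumes an integer "duration of a datarecord" and uses exact fractions for the position inside a record; the function names are hypothetical, and 61440 is the byte limit quoted above.

```python
from fractions import Fraction

MAX_RECORD_BYTES = 61440  # datarecord size limit mentioned in the answer


def record_bytes(samples_per_signal: list[int]) -> int:
    """Size of one datarecord: two bytes per sample, summed over all signals."""
    return 2 * sum(samples_per_signal)


def sample_time(index: int, samples_per_record: int,
                record_duration_s: int) -> Fraction:
    """Exact time in seconds of sample `index` (0-based) since the start of the file."""
    records_passed, within = divmod(index, samples_per_record)
    return records_passed * record_duration_s + Fraction(
        within * record_duration_s, samples_per_record)
```

For the 1/30 Hz example, `sample_time(5, 1, 30)` and `sample_time(5, 120, 3600)` both describe the sixth sample at exactly 150 s, without any floating-point arithmetic.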
Q10. Are the 2-byte samples in the data blocks written
in big or little endian?
Indeed, the byte order for the integer
datasamples is different in (among others) Intel and Motorola processors. In the first
EDF application, described in the original
article, the Intel little endian byte order was applied (see section
Results) because we had mainly PCs in mind. That is, the lower-significance
byte was stored before (at lower address than) the higher-significance byte: the
integer samples were stored "little-end-first". At present (March 1999) probably
all EDF files in the world are in the little endian format and certainly all EDF
viewers expect so. Let us keep it that way and ask the Motorola users to force
the little endian in their routines. Some Sun users already did so in Matlab.
So, EDF samples should be stored in the little endian
format (the default format in PC applications).
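Forcing little endian regardless of the host processor is straightforward in most languages. The Python sketch below uses the standard `struct` module with the `<` prefix; it assumes the samples are signed 16-bit two's-complement integers, and the function names are hypothetical.

```python
import struct


def decode_samples(raw: bytes) -> list[int]:
    """Decode little-endian 2-byte samples, independent of the host byte order."""
    if len(raw) % 2:
        raise ValueError("sample data must contain an even number of bytes")
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def encode_samples(samples: list[int]) -> bytes:
    """Encode integers in -32768..32767 as little-endian 2-byte samples."""
    return struct.pack(f"<{len(samples)}h", *samples)
```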
Q11. What are common errors in EDF files?
Q12. What are common errors in EDF viewers?
Q13. Do the mentioned EDF-supporting
companies really provide correct EDF?
Since most companies recently
started doing EDF, I think it is not fair to tell now. Not all companies provide
perfect EDF. So, if you plan to buy EDF equipment, check its EDF files using the
Alpo Värri program CHECKREC
and one of the free
EDF viewers. Or mail me a file and I will do
a rough check (this offer is valid until further notice). Tell the supplier to
correct any errors. Next year I would like to start, with your cooperation,
evaluating EDF companies and list the results on this site.
Q14. How to encode free-text
annotations?
Simply dedicate one of the signals
in the file to annotations by giving it the label "ANNOTATION". Let the 'samples' of this
signal in the datarecords store time-stamped annotations in standard ASCII
format (see Q1) as follows. If the technician switches off the lights on 17
March 1999 at 23:54:12.2, this is stored as the 30-byte ASCII string
'19990317235412.200@Lights off@' without the single quotes. Note that the time
stamp has the order year-month-day-hour-minute-second.milliseconds and unknown
characters are set to 0 (byte value 48). The .milliseconds may be omitted. Each
signal sample offers two bytes of space and each byte contains one of the
standard-ASCII byte values. Each annotation is stored byte-by-byte without
changing the order of the bytes. Unused bytes, in between the annotations, get byte value
0.
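Building such an annotation string could look like the sketch below. The function names are hypothetical, and rejecting "@" inside the text is my own assumption (the scheme uses "@" as the wrapper, so allowing it in the text would make parsing ambiguous).

```python
from datetime import datetime


def encode_annotation(when: datetime, text: str, with_ms: bool = True) -> bytes:
    """Return the bytes 'yyyymmddhhmmss[.mmm]@text@' for one annotation."""
    if "@" in text or any(not 32 <= ord(c) <= 126 for c in text):
        raise ValueError("text must be printable ASCII without '@'")
    stamp = when.strftime("%Y%m%d%H%M%S")
    if with_ms:
        stamp += f".{when.microsecond // 1000:03d}"
    return f"{stamp}@{text}@".encode("ascii")


def pad_annotation(annotation: bytes, n_samples: int) -> bytes:
    """Fill the unused bytes of n_samples two-byte 'samples' with byte value 0."""
    capacity = 2 * n_samples
    if len(annotation) > capacity:
        raise ValueError("annotation does not fit in this datarecord")
    return annotation + b"\x00" * (capacity - len(annotation))
```

With the example from the text, `encode_annotation(datetime(1999, 3, 17, 23, 54, 12, 200000), "Lights off")` yields the 30-byte string shown above.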
Typically, annotations
denote events that occurred in the datarecord in which they are written. Only if
this is not possible because they are too many or too large, they can overflow
into earlier and/or later datarecords. Annotations must be in the file in
chronological order. If two annotations have identical time stamps, their order
is arbitrary. Any annotations that denote events that occurred in preceding
datarecords must immediately follow the preceding annotation: there must be no
0-valued bytes between them. Any annotations that denote events that occurred in
later datarecords must immediately precede the following annotation: there must
be no 0-valued bytes between them.
Choose the sampling frequency of the ANNOTATION
signal high enough to accommodate the amount of text bytes to be stored. For
instance, a sampling frequency of 5Hz accommodates 5*60*2=600 characters per
minute. Since time stamp and text wrapper (yyyymmddhhmmss.mmm@@) take 20
characters, this sampling frequency allows an average of 6 lines of 80
characters or 10 lines of 40 characters per minute. This shows that an
annotation sampling frequency of 5Hz is quite sufficient for most
neurophysiological applications. Apparently, the annotations do not occupy much
space when compared to the recorded signals.
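The capacity arithmetic above can be reproduced with a one-line calculation. The function name is hypothetical; the 20 characters of overhead and the two bytes per sample are taken from the text.

```python
def annotation_lines_per_minute(fs_hz: int, line_chars: int,
                                overhead_chars: int = 20) -> int:
    """Number of annotations of `line_chars` text characters that fit per minute."""
    bytes_per_minute = fs_hz * 60 * 2  # two bytes (characters) per sample
    return bytes_per_minute // (line_chars + overhead_chars)
```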
The main idea of this scheme was described by
Maarten van de Velde et al in the article "Digital archival and exchange of
events in a simple format for polygraphic recordings with application in event
related potential studies". This
article describes still more possibilities for the encoding of events and
was published in the J of Electroenc Clin Neurophysiol 106, 1998:547-551.
They suggest using the ISO-8859-1 (Latin1) extended
ASCII coding scheme. But I prefer to use only the genuinely standard ASCII
characters that are also allowed in the header (see Q1). I suggest you do the
same because it avoids the ISO-8859 alphabet discussions between countries. The
only languages that can comfortably be written with the repertoire of standard ASCII are Latin,
Swahili, Hawaiian and American English. This is OK because EDF should support
international exchange of data and it does not make much sense to send Arabic or
Portuguese text to Chinese or Finnish colleagues. In order to disprove any
suggestion that EDF is for Europeans only, I would suggest using American
English rather than Latin.
Q15. How to encode physiological
events such as apneas and leg movements?
Simply use the annotation encoding in a (possibly additional)
ANNOTATION signal as described at Q14. Use the standard texts 'Apnea onset', 'Apnea
end', 'Leg movement onset' and 'Leg movement end'. For instance, an apnea with a
duration of 35s and starting at 03:47:17.900 would be encoded as two
events:
00000000034717.900@Apnea onset@ and
00000000034752.900@Apnea end@. In this way, information about the onset and
duration of the event can be recovered from the annotations. It is important to
use standard texts, so automatic processing of the events is possible.
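The apnea example can be generated from an onset time and a duration as sketched below. The function names are hypothetical, times are given in milliseconds since midnight, and the date is written as eight zeros as in the example. Events that run past midnight are rejected here because an unknown date cannot be incremented.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def stamp_unknown_date(ms_of_day: int) -> str:
    """Time stamp with the date digits set to '0', e.g. 00000000034717.900."""
    if not 0 <= ms_of_day < MS_PER_DAY:
        raise ValueError("time of day out of range")
    seconds, ms = divmod(ms_of_day, 1000)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"00000000{hours:02d}{minutes:02d}{secs:02d}.{ms:03d}"


def event_annotations(name: str, onset_ms: int, duration_ms: int) -> list[str]:
    """Return the '<name> onset' and '<name> end' annotation strings."""
    return [
        f"{stamp_unknown_date(onset_ms)}@{name} onset@",
        f"{stamp_unknown_date(onset_ms + duration_ms)}@{name} end@",
    ]
```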
Q16. How to store
analysis results in EDF?
Any automatic or
manual analysis result that is again a single or multi-channel timeseries (for
instance a deltaplot together with an automatically scored hypnogram) can easily
be stored in an EDF file. Some experience and discussions in the COMAC-BME and
Siesta groups resulted in the
following guidelines:
1. The analysis
result should be stored in a separate EDF file. In order to reliably link the
analysis file to the originally recorded file, the analysis program
should:
- make the name part of the two filenames
identical
- make the extension part of the two
filenames different.
- copy the patient-id line
(80 characters) from the header of the recorded file to the header of the
analysis file.
- preferably start the analysis
at the exact beginning of the originally recorded file and let the program
simply copy startdate and starttime from the originally recorded file into the
analysis file. If there are good arguments not to start the analysis at the
start of the recording, then at least make the timing of the analysis file (that
is startdate, starttime, number and duration of the datarecords) correspond to
the timing of the recorded file. So, if you analyse a portion from 23:05:00 till
23:25:00 of the original recording that was made on August 2, 1999, then the
analysis file should have startdate 02.08.99 and starttime 23.05.00. Number and
duration of the analysis-data records can be chosen according to the EDF
guidelines and the applied smoothing windows. If, for example, your
analysis-data records each refer to 30s of the recording, the mentioned analysis
file should have 40 of these datarecords.
In this way it is clear that both files refer
to one time period in one person's life. Some EDF viewers (like PolyMan)
are capable of showing the two (or more) files time-synchronized on one
screen.
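The timing fields of the analysis file in the example above can be derived as in this sketch. The function name is hypothetical, and it assumes that the analysed portion is a whole number of analysis datarecords.

```python
from datetime import datetime


def analysis_timing(start: datetime, end: datetime,
                    record_s: int) -> tuple[str, str, int]:
    """Return (startdate, starttime, number of datarecords) for the analysis file."""
    span_s = int((end - start).total_seconds())
    if span_s <= 0 or span_s % record_s:
        raise ValueError("portion must be a positive whole number of datarecords")
    return start.strftime("%d.%m.%y"), start.strftime("%H.%M.%S"), span_s // record_s
```

For the portion from 23:05:00 till 23:25:00 on August 2, 1999 with 30s datarecords, this gives startdate 02.08.99, starttime 23.05.00 and 40 datarecords.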
2. Apply suitable scaling factors
in such a way that a large part of the available range of -32767 till 32767 for
the values of the analysis results is used. Put these scaling factors in the
header (digital and physical minimum and maximum) of the analysis file. If
necessary, the scaling factor can be adapted to the dynamic range of the
analysis result, after the analysis was done.
2b. If solution 2 is really impossible
because the useful dynamic range of the analysis result is too large, but
only then, apply the standardized logarithmic
transformation to store floating point values in EDF. However, be aware that
viewers that do not yet accommodate the appropriate exponential inverse scaling
can only show the results on a logarithmic scale. So really try solution
2 first!
3. If the analysis
contains a hypnogram, sleep stages W,1,2,3,4,R,M should be coded in the
datablocks as the integer numbers 0,1,2,3,4,5,6 respectively. Unscored epochs
should be coded as the integer number 9.
4. Automatically document the analysis principle and
parameters in the Recording-id, Label, Transducer type, Physical dimension and
Prefiltering fields in the header of the analysis file.
Q17. Should the starttime of the recording be in local
time or for instance in Greenwich Mean Time?
Until now (2000), everybody has used local time, so I suggest that you
do the same.
Q18. Are there any standard texts for the EDF ascii
fields?
With the help of several colleagues,
I constructed some standard
texts.
These texts comply
with the official specs and therefore do not cause any incompatibility with EDF
software. EDF import (reader/browser, analysis) software should abide by the
official specs and not depend on these standard texts. However, if the software
detects that the imported file does contain standard texts, it can automatically
recognize labels and dimensions.
Using standard texts is not required for EDF
compatibility. However, they reduce the probability of errors and avoid the
need for user input in some types of automatic analysis programs. Therefore, it
is wise to use the standard texts wherever possible.
Q19. Can EDF store hypnograms?
Yes, of course. Simply consider that a hypnogram is a single
signal of 1 sample per 30s (or in some labs per 20s). For instance, all 1770
hypnograms made in the Siesta
project are stored in an EDF file. The sleep stages W, 1, 2, 3, 4, R, MT and
'unscored' were coded in the EDF files as integer numbers 0, 1, 2, 3, 4, 5, 6
and 9, respectively. The EDF
recording of an OSAS patient contains not only the polygraphic signals but
also the hypnogram as one of the signals.
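The stage coding can be captured in a small lookup table. The sketch below uses hypothetical names; the codes themselves are the ones listed above, with `None` standing for an unscored epoch.

```python
STAGE_CODES = {"W": 0, "1": 1, "2": 2, "3": 3, "4": 4, "R": 5, "MT": 6}
UNSCORED = 9


def encode_hypnogram(stages: list) -> list[int]:
    """Map one stage label per epoch to the integer stored as an EDF sample."""
    return [UNSCORED if s is None else STAGE_CODES[s] for s in stages]


def decode_hypnogram(samples: list[int]) -> list:
    """Inverse mapping: integer samples back to stage labels (None = unscored)."""
    labels = {code: stage for stage, code in STAGE_CODES.items()}
    return [None if v == UNSCORED else labels[v] for v in samples]
```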